40 research outputs found

    Recognizing contextual valence shifters in document-level sentiment classification

    Sentiment classification is an emerging research field. Due to the rich opinionated web content, people and organizations are interested in knowing others' opinions, so they need an automated tool for analyzing and summarizing these opinions. One of the major tasks of sentiment classification is to classify a document (i.e. a blog, news article or review) as holding an overall positive or negative sentiment. Machine learning approaches have succeeded in achieving better results than semantic orientation approaches in document-level sentiment classification; however, they still need to take linguistic context into account, by making use of the so-called contextual valence shifters. Early research has tried to add sentiment features and contextual valence shifters to the machine learning approach to tackle this problem, but the classifier's performance was low. In this study, we would like to improve the performance of document-level sentiment classification using the machine learning approach by proposing new feature sets that refine the traditional sentiment feature extraction method and take contextual valence shifters into consideration from a different perspective than the earlier research. These feature sets include: 1) a feature set consisting of 16 features for counting different categories of contextual valence shifters (intensifiers, negators and polarity shifters) as well as the frequency of words grouped according to their final (modified) polarity; and 2) another feature set consisting of the frequency of each sentiment word after modifying its prior polarity.
We performed several experiments to: 1) compare our proposed feature sets with the traditional sentiment features that count the frequency of each sentiment word while disregarding its prior polarity; 2) compare our proposed feature sets, combined with stylistic features and n-grams, against traditional sentiment features combined with stylistic features and n-grams; and 3) evaluate the effectiveness of our proposed feature sets against stylistic features and n-grams by performing feature selection. The results of all the experiments show a significant improvement over the baselines in terms of accuracy, precision and recall, which indicates that our proposed feature sets are effective in document-level sentiment classification.
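The idea of counting valence shifters and grouping sentiment words by their modified polarity can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the toy lexicons, the function name `shifter_features`, and the simple rule that a negator flips the polarity of the next sentiment word (looking past intensifiers only). It is not the 16-feature set described in the abstract.

```python
from collections import Counter

# Toy lexicons: illustrative assumptions, not the resources used in the study.
PRIOR_POLARITY = {"good": 1, "great": 1, "bad": -1, "boring": -1}
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very", "really"}


def shifter_features(tokens):
    """Count shifters and sentiment words grouped by their modified polarity."""
    counts = Counter()
    negate = False  # True while a negator is waiting for a sentiment word
    for tok in tokens:
        word = tok.lower()
        if word in NEGATORS:
            counts["negators"] += 1
            negate = True
        elif word in INTENSIFIERS:
            counts["intensifiers"] += 1  # negation scope continues past intensifiers
        elif word in PRIOR_POLARITY:
            polarity = PRIOR_POLARITY[word]
            if negate:
                polarity = -polarity  # negator flips the prior polarity
                negate = False
            counts["positive" if polarity > 0 else "negative"] += 1
        else:
            negate = False  # assumed rule: any other word ends the negation scope
    return counts


features = shifter_features("The plot was not very good but the acting was great".split())
print(dict(features))
```

In this sketch "good" is counted as negative because of the preceding "not", while "great" keeps its prior positive polarity. A realistic system would need a proper sentiment lexicon and a more careful treatment of negation scope.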

    Data-driven Methods for Course Selection and Sequencing

    University of Minnesota Ph.D. dissertation. May 2019. Major: Computer Science. Advisor: George Karypis. 1 computer file (PDF); xiii, 115 pages. Learning analytics in higher education is an emerging research field that combines data mining, machine learning, statistics, and education on learning-related data, in order to develop methods that can improve the learning environment for learners and allow educators and administrators to be more effective. The vast amount of data available about students' interactions and their performance in classrooms has motivated researchers to analyze this data in order to gain insights about the learning environment for the ultimate goal of improving undergraduate education and student retention rates. In this thesis, we focus on the problem of course selection and sequencing, where we would like to help students make informed decisions about which courses to register for in their following terms. By analyzing the historical enrollment and grades data, this thesis studies the two main problems of course selection and sequencing, namely grade prediction and course recommendation. In addition, it analyzes the relationship between degree planning in terms of course timing and ordering and the students' GPA and time to degree. First, we focus on predicting the grades that students will obtain on future courses so that they can make informed decisions about which courses to register for in their following terms. We model the grade prediction problem as cumulative knowledge-based linear regression models that learn the courses' required and provided knowledge components and use them to estimate a student's knowledge state at each term and predict the grades that he/she can obtain on future courses.
Second, we focus on improving the knowledge-based regression models we previously developed by modeling the complex interactions among prior courses using non-linear and neural attentive models, in order to have more accurate estimation of a student's knowledge state. In addition, we model the interactions between a target course, whose grade we would like to predict, and the other courses taken concurrently with it. We hypothesize that concurrently-taken courses can affect a student's performance in a target course, and thus modeling their interactions with that course should lead to better predictions. Third, we focus on analyzing the degree plans of students to gain more insights about how course timing and sequencing relate to their GPAs and time to degree. Toward this end, we define several course timing and course sequencing metrics and compare different sub-groups of students who have achieved high vs low GPA as well as sub-groups of students who have graduated on time vs over time. Fourth, we focus on improving course recommendation by recommending to each student a set of courses which he/she is prepared for and expected to perform well in. We model this problem as a grade-aware course recommendation problem, where we propose two different approaches. The first approach ranks the courses by using an objective function that differentiates between courses that are expected to increase or decrease a student's GPA. The second approach combines the grades predicted by grade prediction methods with the rankings produced by course recommendation methods to improve the final course rankings. To obtain the course rankings in both approaches, we adapted two widely-used representation learning techniques to learn the optimal temporal ordering between courses.
In summary, this thesis addresses two closely related problems by: (1) developing cumulative knowledge-based regression models for grade prediction; (2) developing context-aware non-linear and neural attentive knowledge-based models for grade prediction; (3) analyzing degree planning and how the time when students take courses and how they sequence them relate to their GPAs and time to degree; and (4) developing novel approaches for grade-aware course recommendation.

    Physical activity level and stroke risk in US population: A matched case-control study of 102,578 individuals

    Background: Stroke has been linked to a lack of physical activity; however, the extent of the association between inactive lifestyles and stroke risk has yet to be characterized across large populations. Purpose: This study aimed to explore the association between activity-related behaviors and stroke incidence. Methods: Data from the 1999 to 2018 waves of the concurrent cross-sectional National Health and Nutrition Examination Survey (NHANES) were extracted. We analyzed participant characteristics and outcomes for all participants with data on whether they had a stroke or not and assessed how different forms of physical activity affect the incidence of disease. Results: Of the 102,578 individuals included, 3851 had a history of stroke. A range of activity-related behaviors was protective against stroke, including engaging in moderate-intensity work over the last 30 days (OR = 0.8, 95% CI = 0.7-0.9; P = 0.001) and vigorous-intensity work activities over the last 30 days (OR = 0.6, 95% CI = 0.5-0.8; P < 0.001), and muscle-strengthening exercises (OR = 0.6, 95% CI = 0.5-0.8; P < 0.001). Conversely, more than 4 h of daily TV, video, or computer use was positively associated with the likelihood of stroke (OR = 11.7, 95% CI = 2.1-219.2; P = 0.022). Conclusion: Different types, frequencies, and intensities of physical activity were associated with reduced stroke incidence, implying that there is an option for everyone. Daily or every other day activities are more critical in reducing stroke than reducing sedentary behavior duration.

    Antimicrobial resistance among migrants in Europe: a systematic review and meta-analysis

    BACKGROUND: Rates of antimicrobial resistance (AMR) are rising globally and there is concern that increased migration is contributing to the burden of antibiotic resistance in Europe. However, the effect of migration on the burden of AMR in Europe has not yet been comprehensively examined. Therefore, we did a systematic review and meta-analysis to identify and synthesise data for AMR carriage or infection in migrants to Europe to examine differences in patterns of AMR across migrant groups and in different settings. METHODS: For this systematic review and meta-analysis, we searched MEDLINE, Embase, PubMed, and Scopus with no language restrictions from Jan 1, 2000, to Jan 18, 2017, for primary data from observational studies reporting antibacterial resistance in common bacterial pathogens among migrants to 21 European Union-15 and European Economic Area countries. To be eligible for inclusion, studies had to report data on carriage or infection with laboratory-confirmed antibiotic-resistant organisms in migrant populations. We extracted data from eligible studies and assessed quality using piloted, standardised forms. We did not examine drug resistance in tuberculosis and excluded articles solely reporting on this parameter. We also excluded articles in which migrant status was determined by ethnicity, country of birth of participants' parents, or was not defined, and articles in which data were not disaggregated by migrant status. Outcomes were carriage of or infection with antibiotic-resistant organisms. We used random-effects models to calculate the pooled prevalence of each outcome. The study protocol is registered with PROSPERO, number CRD42016043681. FINDINGS: We identified 2274 articles, of which 23 observational studies reporting on antibiotic resistance in 2319 migrants were included. 
The pooled prevalence of any AMR carriage or AMR infection in migrants was 25·4% (95% CI 19·1-31·8; I²=98%), including meticillin-resistant Staphylococcus aureus (7·8%, 4·8-10·7; I²=92%) and antibiotic-resistant Gram-negative bacteria (27·2%, 17·6-36·8; I²=94%). The pooled prevalence of any AMR carriage or infection was higher in refugees and asylum seekers (33·0%, 18·3-47·6; I²=98%) than in other migrant groups (6·6%, 1·8-11·3; I²=92%). The pooled prevalence of antibiotic-resistant organisms was slightly higher in high-migrant community settings (33·1%, 11·1-55·1; I²=96%) than in migrants in hospitals (24·3%, 16·1-32·6; I²=98%). We did not find evidence of high rates of transmission of AMR from migrant to host populations. INTERPRETATION: Migrants are exposed to conditions favouring the emergence of drug resistance during transit and in host countries in Europe. Increased antibiotic resistance among refugees and asylum seekers and in high-migrant community settings (such as refugee camps and detention facilities) highlights the need for improved living conditions, access to health care, and initiatives to facilitate detection of and appropriate high-quality treatment for antibiotic-resistant infections during transit and in host countries. Protocols for the prevention and control of infection and for antibiotic surveillance need to be integrated in all aspects of health care, which should be accessible for all migrant groups, and should target determinants of AMR before, during, and after migration. FUNDING: UK National Institute for Health Research Imperial Biomedical Research Centre, Imperial College Healthcare Charity, the Wellcome Trust, and UK National Institute for Health Research Health Protection Research Unit in Healthcare-associated Infections and Antimicrobial Resistance at Imperial College London.

    Surgical site infection after gastrointestinal surgery in high-income, middle-income, and low-income countries: a prospective, international, multicentre cohort study

    Background: Surgical site infection (SSI) is one of the most common infections associated with health care, but its importance as a global health priority is not fully understood. We quantified the burden of SSI after gastrointestinal surgery in countries in all parts of the world. Methods: This international, prospective, multicentre cohort study included consecutive patients undergoing elective or emergency gastrointestinal resection within 2-week time periods at any health-care facility in any country. Countries with participating centres were stratified into high-income, middle-income, and low-income groups according to the UN's Human Development Index (HDI). Data variables from the GlobalSurg 1 study and other studies that have been found to affect the likelihood of SSI were entered into risk adjustment models. The primary outcome measure was the 30-day SSI incidence (defined by US Centers for Disease Control and Prevention criteria for superficial and deep incisional SSI). Relationships with explanatory variables were examined using Bayesian multilevel logistic regression models. This trial is registered with ClinicalTrials.gov, number NCT02662231. Findings: Between Jan 4, 2016, and July 31, 2016, 13 265 records were submitted for analysis. 12 539 patients from 343 hospitals in 66 countries were included. 7339 (58·5%) patients were from high-HDI countries (193 hospitals in 30 countries), 3918 (31·2%) patients were from middle-HDI countries (82 hospitals in 18 countries), and 1282 (10·2%) patients were from low-HDI countries (68 hospitals in 18 countries). In total, 1538 (12·3%) patients had SSI within 30 days of surgery. The incidence of SSI varied between countries with high (691 [9·4%] of 7339 patients), middle (549 [14·0%] of 3918 patients), and low (298 [23·2%] of 1282) HDI (p < 0·001).
The highest SSI incidence in each HDI group was after dirty surgery (102 [17·8%] of 574 patients in high-HDI countries; 74 [31·4%] of 236 patients in middle-HDI countries; 72 [39·8%] of 181 patients in low-HDI countries). Following risk factor adjustment, patients in low-HDI countries were at greatest risk of SSI (adjusted odds ratio 1·60, 95% credible interval 1·05–2·37; p=0·030). 132 (21·6%) of 610 patients with an SSI and a microbiology culture result had an infection that was resistant to the prophylactic antibiotic used. Resistant infections were detected in 49 (16·6%) of 295 patients in high-HDI countries, in 37 (19·8%) of 187 patients in middle-HDI countries, and in 46 (35·9%) of 128 patients in low-HDI countries (p < 0·001). Interpretation: Countries with a low HDI carry a disproportionately greater burden of SSI than countries with a middle or high HDI and might have higher rates of antibiotic resistance. In view of WHO recommendations on SSI prevention that highlight the absence of high-quality interventional research, urgent, pragmatic, randomised trials based in LMICs are needed to assess measures aiming to reduce this preventable complication

    Effects of hospital facilities on patient outcomes after cancer surgery: an international, prospective, observational study

    Background: Early death after cancer surgery is higher in low-income and middle-income countries (LMICs) than in high-income countries, yet the impact of facility characteristics on early postoperative outcomes is unknown. The aim of this study was to examine the association between hospital infrastructure, resource availability, and processes on early outcomes after cancer surgery worldwide. Methods: A multimethods analysis was performed as part of the GlobalSurg 3 study, a multicentre, international, prospective cohort study of patients who had surgery for breast, colorectal, or gastric cancer. The primary outcomes were 30-day mortality and 30-day major complication rates. Potentially beneficial hospital facilities were identified by variable selection to select those associated with 30-day mortality. Adjusted outcomes were determined using generalised estimating equations to account for patient characteristics and country-income group, with population stratification by hospital. Findings: Between April 1, 2018, and April 23, 2019, facility-level data were collected for 9685 patients across 238 hospitals in 66 countries (91 hospitals in 20 high-income countries; 57 hospitals in 19 upper-middle-income countries; and 90 hospitals in 27 low-income to lower-middle-income countries). The availability of five hospital facilities was inversely associated with mortality: ultrasound, CT scanner, critical care unit, opioid analgesia, and oncologist. After adjustment for case-mix and country income group, hospitals with three or fewer of these facilities (62 hospitals, 1294 patients) had higher mortality compared with those with four or five (adjusted odds ratio [OR] 3.85 [95% CI 2.58-5.75]; p<0.0001), with excess mortality predominantly explained by a limited capacity to rescue following the development of major complications (63.0% vs 82.7%; OR 0.35 [0.23-0.53]; p<0.0001).
Across LMICs, improvements in hospital facilities would prevent one to three deaths for every 100 patients undergoing surgery for cancer. Interpretation: Hospitals with higher levels of infrastructure and resources have better outcomes after cancer surgery, independent of country income. Without urgent strengthening of hospital infrastructure and resources, the reductions in cancer-associated mortality associated with improved access will not be realised.

    Accounting for Language Changes over Time in Document Similarity Search

    Given a query document, ranking the documents in a collection based on how similar they are to the query is an essential task with extensive applications. For collections that contain documents whose creation dates span several decades, this task is further complicated by the fact that the language changes over time. For example, many terms add or lose one or more senses to meet people's evolving needs. To address this problem, we present methods that take advantage of two types of information in order to account for the language change. The first is the citation network that often exists within the collection, which can be used to link related documents with significantly different creation dates (and hence different language use). The second is the changes in the usage frequency of terms that occur over time, which can indicate changes in their senses and uses. These methods utilize the above information while estimating the representation of both documents and terms within the context of non-probabilistic static and dynamic topic models. Our experiments on two real-world datasets that span more than 40 years show that our proposed methods improve the retrieval performance of existing models and that these improvements are statistically significant

    Cumulative Knowledge-based Regression Models for Next-term Grade Prediction

    Grade prediction for courses not yet taken by students is important so as to guide them while registering for next-term courses. Moreover, it can help their advisers design personalized degree plans and modify them based on the students' performance. In this paper, we present cumulative knowledge-based regression models with different course-knowledge spaces for the task of next-term grade prediction. These models utilize historical student-course grade data as well as the information available about the courses that capture the relationships between courses in terms of the knowledge components provided by them. Our experiments on a large dataset obtained from the College of Science and Engineering at University of Minnesota show that our proposed methods achieve better performance than competing methods and that these performance gains are statistically significant.
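As a rough illustration of what a cumulative knowledge-based prediction could look like, the Python sketch below accumulates a student's knowledge state from prior courses and scores a target course against it. This is one simple reading of the abstract, not the paper's formulation. The knowledge vectors, course names, and function names are hypothetical, and the vectors are fixed by hand here rather than learned from grade data as the abstract describes.

```python
def knowledge_state(prior_grades, provided):
    """Accumulate knowledge: each prior course adds grade * provided-knowledge vector."""
    dim = len(next(iter(provided.values())))
    state = [0.0] * dim
    for course, grade in prior_grades.items():
        for k, weight in enumerate(provided[course]):
            state[k] += grade * weight
    return state


def predict_grade(prior_grades, provided, required_target):
    """Predict a grade as the dot product of the target course's required-knowledge
    vector with the student's accumulated knowledge state (assumed linear form)."""
    state = knowledge_state(prior_grades, provided)
    return sum(r * s for r, s in zip(required_target, state))


# Hypothetical two-component knowledge space and hand-set vectors.
provided = {"CALC1": [1.0, 0.0], "PROG1": [0.0, 1.0]}
grades = {"CALC1": 4.0, "PROG1": 3.0}
required_for_target = [0.5, 0.5]

print(predict_grade(grades, provided, required_for_target))
```

With these made-up values the knowledge state is `[4.0, 3.0]`, so the prediction is the average of the two prior grades. In a learned model, the provided and required vectors would be fitted to historical student-course grade data.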